The automation of bioinformatics processes through workflow management systems
Abstract
Extended abstract: A huge and growing amount of biological information is distributed over the Internet. The structure and contents of data banks are extremely heterogeneous, and many bioinformatics software tools are available on the net through many different interfaces. As a result, data integration over the network is now a must for biomedical research [1]. Current integration tools, mainly data warehouses and integration software (like SRS [4]), have strong limitations due to the quantity of data to be managed and the frequency of updates, which affect both the data itself and the structure of the databases. This makes the adoption of new, flexible and extensible network tools for data integration and analysis a necessity in bioinformatics. Information and Communication Technology (ICT) standards and tools, like XML, Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Effort should also be put into adhering to standards and using open source software, which is increasingly adopted for the development of new tools [2]. Many XML languages and Web Services for bioinformatics have already been designed and implemented, together with development and support tools [4,7]. Consequently, some WMS have been proposed [5,6] and are now being carefully tested to verify their actual ability to cope with the data integration issue. While their potential is clear, some limitations are now emerging. These include both network issues (e.g., quality of service, speed, access restrictions) and practical issues (e.g., long-running jobs, huge input/output). WMS assume that researchers know which bioinformatics resources can be reached through a programmatic interface. They also presume that users are skilled in programming and building workflows. Therefore, they are not a viable option for the majority of researchers, who lack these skills.
Portals that enable these researchers to benefit from new technologies by executing predefined workflows in a user-friendly environment can therefore be useful, and they are also under development and testing [8]. In this paper, we present a methodology for data integration in biomedical research that is based on the adoption of XML and Web Services for data interchange, on WMS for the automation of processes and on portals for their exploitation by interested researchers. We briefly describe available WMS, comparing them on the basis of their functions, adherence to standards and availability. Finally, we introduce the biowep workflow enactment portal for bioinformatics [8] and discuss the current limitations of this methodology and the prospects for overcoming them.
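As a rough illustration of the methodology outlined above (XML for data interchange, services as processing steps, and a workflow that chains them automatically), the following minimal Python sketch may help. It is not taken from biowep or from any of the WMS mentioned: the function names, the XML element names, the accession "EXAMPLE1" and the sequence data are all hypothetical, and the two "services" are local stand-ins rather than real Web Service calls.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# In this sketch, a workflow step takes an XML document (as a string)
# and returns a new XML document (as a string).
Step = Callable[[str], str]


def fetch_sequence(request_xml: str) -> str:
    """Hypothetical stand-in for a Web Service that returns a sequence record."""
    accession = ET.fromstring(request_xml).findtext("accession")
    record = ET.Element("record")
    ET.SubElement(record, "accession").text = accession
    ET.SubElement(record, "sequence").text = "ATGGCGTAA"  # made-up data
    return ET.tostring(record, encoding="unicode")


def compute_gc_content(record_xml: str) -> str:
    """Hypothetical analysis step: annotate the record with its GC fraction."""
    record = ET.fromstring(record_xml)
    sequence = record.findtext("sequence")
    gc_fraction = sum(base in "GC" for base in sequence) / len(sequence)
    ET.SubElement(record, "gc_content").text = f"{gc_fraction:.3f}"
    return ET.tostring(record, encoding="unicode")


def run_workflow(steps: List[Step], input_xml: str) -> str:
    """Run the steps in order, passing each step's XML output to the next."""
    document = input_xml
    for step in steps:
        document = step(document)
    return document


# A predefined workflow, of the kind a portal could offer to end users.
result = run_workflow(
    [fetch_sequence, compute_gc_content],
    "<query><accession>EXAMPLE1</accession></query>",
)
print(result)
```

In a real setting each step would wrap a remote service, which is where the limitations noted in the abstract (long-running jobs, large inputs and outputs, access restrictions) would have to be handled.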
Similar resources
Automation of in-silico data analysis processes through workflow management systems
Data integration is needed in order to cope with the huge amounts of biological information now available and to perform data mining effectively. Current data integration systems have strict limitations, mainly due to the number of resources, their size and frequency of updates, their heterogeneity and distribution on the Internet. Integration must therefore be achieved by accessing network ser...
Tool Support for Dynamic Development Processes
Development processes in engineering disciplines are highly dynamic. Since development projects cannot be planned completely in advance, the process to be executed changes at run time. We present a process management system which seamlessly integrates planning and enactment. The system manages processes at the project management level, but goes beyond the functionality of project management sys...
Guided Composition of Tasks with Logical Information Systems - Application to Data Analysis Workflows in Bioinformatics
In a number of domains, particularly in bioinformatics, there is a need for complex data analysis. For that purpose, elementary data analysis operations called tasks are composed into workflows. The composition of tasks is, however, difficult due to the distributed and heterogeneous resources of bioinformatics. This doctoral work will address the composition of tasks using Logical Information System...
Workflow Management and Databases
Workflow management systems are among the most interesting concepts for supporting modern organizations with a focus on processes rather than on structure. Workflow management systems offer different degrees of automation of business processes. We classify workflow management systems according to the features they provide and the types of processes they support. Database systems facilitate the ...
Pegasus, a workflow management system for science automation
Modern science often requires the execution of large-scale, multi-stage simulation and data analysis pipelines to enable the study of complex systems. The amount of computation and data involved in these pipelines requires scalable workflow management systems that are able to reliably and efficiently coordinate and automate data movement and task execution on distributed computational resources...
Publication date: 2006